[Deepin-Kernel-SIG] [linux 6.18-y] [LoongArch] LoongArch Stage Patch Synchronization 260302#1569
Conversation
Enable this configuration to prevent PSI from working by default, thereby improving system performance Signed-off-by: zhangtianyang <zhangtianyang@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Link: deepin-community@b5a3b47 (cherry picked from commit cdf27e0) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Conflicts: arch/loongarch/configs/loongson3_defconfig
maillist inclusion category: bugfix Ths first EOP packet with a sequence number as seq-1 seems to confuse some PCIe hardware (e.g. Loongson 7A PCHs). Use the real sequence number instead. Fixes: a9c73a0 ("drm/radeon: workaround for CP HW bug on CIK") Link: https://lore.kernel.org/all/73597116d4f004c5f75cf4f13da1af405ea8da8b.camel@icenowy.me/ Link:deepin-community#1182 Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Signed-off-by: lvjianmin <lvjianmin@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> (cherry picked from commit b74f17f220792adb1533d12cf0d9bcf227167b12) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
maillist inclusion category: bugfix The duplication of EOP packets for GFX7/8, with the former one have seq-1 written and the latter one have seq written, seems to confuse some hardware platform (e.g. Loongson 7A series PCIe controllers). Make the content of the duplicated EOP packet the same with the real one, only masking any possible interrupts. Fixes: bf26da9 ("drm/amdgpu: add cache flush workaround to gfx8 emit_fence") Fixes: a2e73f5 ("drm/amdgpu: Add support for CIK parts") Link: https://lore.kernel.org/all/20240617105846.1516006-3-uwu@icenowy.me/ Link: deepin-community#1182 Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Signed-off-by: lvjianmin <lvjianmin@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> (cherry picked from commit 275fc45f08424bf0e33f22a042f798c9b76f2765) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
maillist inclusion category: bugfix Link: deepin-community#1182 Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> (cherry picked from commit 2364f257439b2126a2ebcf9e9a5a55c59784d84b) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> (cherry picked from commit bcaf362) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Conflicts: drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> (cherry picked from commit 89494b3) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Add the missing code when migrating this patch from 4.19 which can cause kernel deadlock. Fixes: bcaf362 (drm/amdgpu: Fix pcie order dislocation) Signed-off-by: wuqianhai <wuqianhai@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> (cherry picked from commit 98acedf) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Conflicts: drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
Fix the pointer error of wptr/rptr in ih_fix_loongarch_pcie_order(). Fixes: bcaf362 (drm/amdgpu: Fix pcie order dislocation) Signed-off-by: zhaotianrui <zhaotianrui@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> (cherry picked from commit c41045a) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Reviewer's GuideLoongArch-specific workarounds are added to AMDGPU/Radeon interrupt handling, ring fences, and ring sizing to address PCIe ordering/cache flush issues, with additional synchronization between interrupt handling and command submission and corresponding ring size accounting updates. Sequence diagram for LoongArch command submission waiting on IH fixsequenceDiagram
actor User as userspace_process
participant IOCTL as amdgpu_cs_submit
participant PARSER as amdgpu_cs_parser
participant ADEV as amdgpu_device
participant IH as amdgpu_ih_fix_is_busy
participant IRQ as amdgpu_irq
User->>IOCTL: submit_command(p)
IOCTL->>PARSER: access p->adev
loop while amdgpu_ih_fix_is_busy
IOCTL->>IH: amdgpu_ih_fix_is_busy(p->adev)
IH->>IRQ: atomic_read(adev->irq.cs_lock)
IRQ-->>IH: cs_lock value
alt cs_lock != 0
IH-->>IOCTL: busy
IOCTL->>IOCTL: msleep(20)
else
IH-->>IOCTL: not busy
end
end
IOCTL->>PARSER: for each job in gang, drm_sched_job_arm
IOCTL->>PARSER: drm_sched_entity_push_job
IOCTL-->>User: return submit result
Class diagram for updated AMDGPU LoongArch IH and IRQ structuresclassDiagram
class amdgpu_device {
+struct amdgpu_irq irq
+struct amdgpu_ih_ring *ih_rings
}
class amdgpu_irq {
+spinlock_t lock
+bool msi_enabled
+u32 srbm_soft_reset
+u32 retry_cam_doorbell_index
+bool retry_cam_enabled
+atomic_t cs_lock
}
class amdgpu_ih_ring {
+u32 *ring
+u32 rptr
+u32 wptr
+u32 ptr_mask
+dma_addr_t rptr_addr
+u32 *rptr_cpu
+u32 *wptr_cpu
+bool use_bus_addr
+wait_queue_head_t wait_process
+u64 processed_timestamp
+bool overflow
+atomic_t lock
+struct work_struct fix_work
+struct amdgpu_device *adev
+int ring_size
+bool enabled
}
class amdgpu_ih {
+int amdgpu_ih_ring_init(adev, ih, ring_size, use_bus_addr)
+void amdgpu_ih_ring_fini(adev, ih)
+int amdgpu_ih_process(adev, ih)
+u32 amdgpu_ih_get_wptr(adev, ih)
+void amdgpu_ih_set_rptr(adev, ih)
+int amdgpu_ih_fix_is_busy(adev)
+int amdgpu_ih_fix_loongarch_pcie_order_start(ih, rptr, wptr, forever)
+int amdgpu_ih_fix_loongarch_pcie_order_end(ih, rptr, wptr)
+void amdgpu_ih_handle_fix_work(work)
}
class amdgpu_cs_parser {
+struct amdgpu_device *adev
+unsigned int gang_size
+struct amdgpu_job **jobs
+int amdgpu_cs_submit(p)
}
amdgpu_device o-- amdgpu_irq : has
amdgpu_device o-- amdgpu_ih_ring : owns
amdgpu_ih_ring --> amdgpu_device : adev
amdgpu_ih ..> amdgpu_ih_ring : manages
amdgpu_ih ..> amdgpu_device : uses
amdgpu_cs_parser --> amdgpu_device : adev
amdgpu_cs_parser ..> amdgpu_ih : uses amdgpu_ih_fix_is_busy
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
390f7c5 to
7bd1ea5
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- There is an unresolved merge conflict in gfx_v11_0_ring_funcs_compute.emit_frame_size (<<<<<<< HEAD / ======= / >>>>>>>), which needs to be cleaned up before this can be merged.
- The LoongArch-specific wait loop in amdgpu_cs_submit() (
while (amdgpu_ih_fix_is_busy(p->adev)) msleep(20);) can block indefinitely; consider adding a timeout or a more explicit wait mechanism so userspace submissions cannot hang forever if the IH fix work gets stuck. - The LoongArch-specific multi-RELEASE_MEM fence emission (10x loop) is duplicated between GFX10 and GFX11 and significantly inflates packet counts; consider factoring this into a common helper to avoid divergence between implementations and keep the emit_frame_size accounting easier to maintain.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There is an unresolved merge conflict in gfx_v11_0_ring_funcs_compute.emit_frame_size (<<<<<<< HEAD / ======= / >>>>>>>), which needs to be cleaned up before this can be merged.
- The LoongArch-specific wait loop in amdgpu_cs_submit() (`while (amdgpu_ih_fix_is_busy(p->adev)) msleep(20);`) can block indefinitely; consider adding a timeout or a more explicit wait mechanism so userspace submissions cannot hang forever if the IH fix work gets stuck.
- The LoongArch-specific multi-RELEASE_MEM fence emission (10x loop) is duplicated between GFX10 and GFX11 and significantly inflates packet counts; consider factoring this into a common helper to avoid divergence between implementations and keep the emit_frame_size accounting easier to maintain.
## Individual Comments
### Comment 1
<location path="drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c" line_range="7327" />
<code_context>
+ r = amdgpu_ring_init(adev, ring, 1024*2, &adev->gfx.eop_irq, irq_type,
</code_context>
<issue_to_address>
**issue (bug_risk):** Merge conflict markers are present in gfx_v11_0_ring_funcs_compute and will break the build.
The `gfx_v11_0_ring_funcs_compute` block still contains Git conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> ...`), which will stop this file from compiling. Please resolve the conflict by selecting one final `.emit_frame_size` implementation, then remove all conflict markers and any leftover `#endif`s.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Synchronizes a LoongArch-focused patchset intended to improve AMDGPU/Radeon interrupt (IH) ordering and fencing/cache-flush robustness, plus adjusts LoongArch defconfigs.
Changes:
- Add LoongArch-specific IH “fix” work + submission-side coordination (cs_lock) in AMDGPU.
- Re-enable/alter dummy EOP + cache flush/fence emission sequences for multiple AMDGPU/Radeon generations on LoongArch, including larger ring/frame-size budgets for GFX10/GFX11.
- Update LoongArch defconfigs (PSI default disabled, MIDI/UMP-related options, legacy ptys off).
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| drivers/gpu/drm/radeon/cik.c | Adjusts dummy EOP sequence behavior on LoongArch; removes prior Loongson64 guard. |
| drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | LoongArch-specific dummy EOP fence payload changes; removes prior Loongson64 guard. |
| drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | Same dummy EOP fence adjustments for GFX v7; removes prior Loongson64 guard. |
| drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | Adds extra cache flush EOP on LoongArch and expands frame-size estimates. |
| drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | Doubles ring size on LoongArch, emits additional RELEASE_MEM packets, updates frame-size budgets. |
| drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | Doubles ring size on LoongArch, emits additional RELEASE_MEM packets, updates frame-size budgets. |
| drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | Adds LoongArch-only cs_lock to IRQ struct. |
| drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | Initializes LoongArch-only cs_lock. |
| drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | Extends IH ring struct with LoongArch-only lock/work fields; exports busy check helper. |
| drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | Implements LoongArch IH ordering “fix” logic and workqueue-based processing. |
| drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | Blocks CS submission while LoongArch IH fix is active. |
| arch/loongarch/configs/loongson3_defconfig | Enables PSI default disabled; updates sound/MIDI/UMP-related options; disables legacy ptys. |
| arch/loongarch/configs/deepin_loongarch_desktop_defconfig | Same config theme as loongson3 defconfig. |
Comments suppressed due to low confidence (1)
drivers/gpu/drm/radeon/cik.c:3550
- This change removes the previous
#ifndef CONFIG_MACH_LOONGSON64guard around the dummy EOP cache-flush workaround.CONFIG_MACH_LOONGSON64is still present in the tree and was explicitly called out as unstable under heavy I/O; re-enabling this path on Loongson64 (MIPS) looks like a regression. Consider restoring the guard forCONFIG_MACH_LOONGSON64while keeping theCONFIG_LOONGARCHhandling, or otherwise document why it is now safe.
/* Workaround for cache flush problems. First send a dummy EOP
* event down the pipe with seq one below.
*/
radeon_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
radeon_ring_write(ring, (EOP_TCL1_ACTION_EN |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int i; | ||
|
|
||
| for (i = 0; i < 10; i++) { | ||
| /* RELEASE_MEM - flush caches, send int */ |
| #ifdef CONFIG_LOONGARCH | ||
| r = amdgpu_ring_init(adev, ring, 1024*2, &adev->gfx.eop_irq, irq_type, | ||
| hw_prio, NULL); |
| <<<<<<< HEAD | ||
| 8 + /* gfx_v11_0_emit_mem_sync */ | ||
| 2, /* gfx_v11_0_ring_emit_cleaner_shader */ | ||
| ======= | ||
| #endif | ||
| 8, /* gfx_v11_0_emit_mem_sync */ | ||
| >>>>>>> df704acbdb160 (drm/amdgpu: Make eleven EOP packet for GFX10_0/GFX11_0 have real content) |
| while (amdgpu_ih_fix_is_busy(p->adev)) | ||
| msleep(20); |
| @@ -3561,9 +3554,12 @@ void cik_fence_gfx_ring_emit(struct radeon_device *rdev, | |||
| radeon_ring_write(ring, addr & 0xfffffffc); | |||
| radeon_ring_write(ring, (upper_32_bits(addr) & 0xffff) | | |||
| DATA_SEL(1) | INT_SEL(0)); | |||
| #ifdef CONFIG_LOONGARCH | |||
| radeon_ring_write(ring, fence->seq); | |||
| #else | |||
| radeon_ring_write(ring, fence->seq - 1); | |||
| radeon_ring_write(ring, 0); | |||
| #endif | |||
| radeon_ring_write(ring, 0); | |||
| @@ -6131,13 +6124,18 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, | |||
| EVENT_INDEX(5))); | |||
| amdgpu_ring_write(ring, addr & 0xfffffffc); | |||
| amdgpu_ring_write(ring, (upper_32_bits(addr) & 0xffff) | | |||
| #ifdef CONFIG_LOONGARCH | |||
| DATA_SEL(write64bit ? 2 : 1) | INT_SEL(0)); | |||
| amdgpu_ring_write(ring, lower_32_bits(seq)); | |||
| amdgpu_ring_write(ring, upper_32_bits(seq)); | |||
| #else | |||
| DATA_SEL(1) | INT_SEL(0)); | |||
| amdgpu_ring_write(ring, lower_32_bits(seq - 1)); | |||
| amdgpu_ring_write(ring, upper_32_bits(seq - 1)); | |||
| @@ -2143,12 +2135,17 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, | |||
| EVENT_INDEX(5))); | |||
| restart_check: | ||
| if (!forever && ++check_cnt > 1) | ||
| return -ENAVAIL; | ||
|
|
||
| if (forever) | ||
| msleep(20); | ||
|
|
| @@ -2143,12 +2135,17 @@ static void gfx_v7_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, | |||
| EVENT_INDEX(5))); | |||
| amdgpu_ring_write(ring, addr & 0xfffffffc); | |||
| amdgpu_ring_write(ring, (upper_32_bits(addr) & 0xffff) | | |||
| #ifdef CONFIG_LOONGARCH | |||
| DATA_SEL(write64bit ? 2 : 1) | INT_SEL(0)); | |||
| amdgpu_ring_write(ring, lower_32_bits(seq)); | |||
| amdgpu_ring_write(ring, upper_32_bits(seq)); | |||
| #else | |||
| DATA_SEL(1) | INT_SEL(0)); | |||
| amdgpu_ring_write(ring, lower_32_bits(seq - 1)); | |||
| amdgpu_ring_write(ring, upper_32_bits(seq - 1)); | |||
| int i; | ||
|
|
||
| for (i = 0; i < 10; i++) { | ||
| /* RELEASE_MEM - flush caches, send int */ |
| } | ||
| } | ||
|
|
||
| #ifdef CONFIG_LOONGARCH |
There was a problem hiding this comment.
可能需要再次明确一下:
所有用“CONFIG_LOONGARCH”括起来的workaround,在 3A/B4000 等平台真的不需要么?如果需要,应改为“MACH_LOONGSON”,无论是 6.18 还是 6.6 内核。
There was a problem hiding this comment.
可能需要再次明确一下:
所有用“CONFIG_LOONGARCH”括起来的workaround,在 3A/B4000 等平台真的不需要么?如果需要,应改为“MACH_LOONGSON”,无论是 6.18 还是 6.6 内核。
我不知道需不需要,原始补丁如此,不过我已经把所有amdgpu里面的LOONGARCH都换成MACH_LOONGSON64了。
There was a problem hiding this comment.
可能需要再次明确一下:
所有用“CONFIG_LOONGARCH”括起来的workaround,在 3A/B4000 等平台真的不需要么?如果需要,应改为“MACH_LOONGSON”,无论是 6.18 还是 6.6 内核。我不知道需不需要,原始补丁如此,不过我已经把所有amdgpu里面的LOONGARCH都换成MACH_LOONGSON64了。
之所以在这里矫情一下是因为我怀疑 6.6 也得这么改一下
The duplication of EOP packets for GFX10_0/GFX11_0, with the former one have seq written and the latter one have seq written, seems to confuse some hardware platform (e.g. Loongson 7A series PCIe controllers). Signed-off-by: wuqianhai <wuqianhai@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> [Conflict for drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c because of commit cb17fff ("drm/amdgpu/mes: remove unused functions") upstreamed] (cherry picked from commit df704ac) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Conflicts: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
…lures The legacy PTY driver (CONFIG_LEGACY_PTYS) is deprecated and not required for most modern systems. It causes unexpected behavior during gnulib tests, leading to test failures related to pseudo-terminal handling. Disabling CONFIG_LEGACY_PTYS removes the old /dev/pty* and /dev/tty* interfaces, allowing gnulib tests to run successfully with the modern /dev/pts/* subsystem. Signed-off-by: yangxiaojuan <yangxiaojuan@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> (cherry picked from commit 026f020) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Conflicts: arch/loongarch/configs/loongson3_defconfig
Like this: [ 130.469445] [ T759] do_page_fault(): sending SIGSEGV to pipewire for invalid read access from 000055560db01c91 [ 130.479455] [ T759] era = 00007fffeea0406c in libasound.so.2.0.0[7fffee940000+11c000] [ 130.487283] [ T759] ra = 00007fffeeafe990 in libspa-alsa.so[7fffeea80000+104000] [ 130.669705] [ T967] do_page_fault(): sending SIGSEGV to pipewire for invalid read access from 00005555619e4000 [ 130.679771] [ T967] era = 00007fffee88e9ac in libspa-alsa.so[7fffee810000+104000] [ 130.687229] [ T967] ra = 00007fffee88e9a0 in libspa-alsa.so[7fffee810000+104000] Signed-off-by: yangxiaojuan <yangxiaojuan@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> (cherry picked from commit f1cc6c0) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Conflicts: arch/loongarch/configs/loongson3_defconfig
7bd1ea5 to
15aab63
Compare
deepin inclusion category: feature Reported-by: WangYuli <wangyuli@aosc.io> Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
#1519
Summary by Sourcery
Improve AMDGPU/Radeon interrupt handling and fencing reliability on LoongArch platforms by adding LoongArch-specific IH ordering fixes, cache flush sequences, and ring size adjustments.
Bug Fixes:
Enhancements: